Restructuring Compressed Texts without Explicit Decompression

Authors

  • Keisuke Goto
  • Shirou Maruyama
  • Shunsuke Inenaga
  • Hideo Bannai
  • Hiroshi Sakamoto
  • Masayuki Takeda
Abstract

We consider the problem of restructuring compressed texts without explicit decompression. We present algorithms that convert a compressed representation of a string T produced by any grammar-based compression algorithm into the representations produced by several specific compression algorithms, including LZ77, LZ78, run-length encoding, and some grammar-based compression algorithms. These are the first algorithms that achieve running times polynomial in the size of the compressed input and output representations of T. Since most of the representations we consider can achieve exponential compression, our algorithms are theoretically faster in the worst case than any algorithm that first decompresses the string for the conversion.
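
To make the idea of converting without decompressing concrete, here is a minimal sketch that is not taken from the paper. It assumes the grammar is given as a straight-line program in which every rule is either a single character or a pair of variables, and it computes the run-length encoding of T bottom-up, merging the boundary runs of the two halves at each rule. The function name `slp_to_rle` and the dictionary layout are illustrative choices, not the authors' notation or algorithm.

```python
from typing import Dict, List, Tuple, Union

# Assumed grammar format (hypothetical, for illustration only):
# each rule is either one terminal character or a pair of variable names.
Rule = Union[str, Tuple[str, str]]
Run = Tuple[str, int]  # (character, repeat count)


def slp_to_rle(rules: Dict[str, Rule], start: str) -> List[Run]:
    """Return the run-length encoding of the string derived from `start`."""
    memo: Dict[str, List[Run]] = {}

    def runs(var: str) -> List[Run]:
        # Each variable is processed once; later uses hit the memo table.
        if var in memo:
            return memo[var]
        rule = rules[var]
        if isinstance(rule, str):
            result = [(rule, 1)]
        else:
            left, right = runs(rule[0]), runs(rule[1])
            if left[-1][0] == right[0][0]:
                # The last run of the left half and the first run of the
                # right half use the same character, so they fuse into one.
                merged = (left[-1][0], left[-1][1] + right[0][1])
                result = left[:-1] + [merged] + right[1:]
            else:
                result = left + right
        memo[var] = result
        return result

    return runs(start)


# 41 rules describing the string "a" repeated 2**40 times.
rules: Dict[str, Rule] = {"X0": "a"}
for i in range(1, 41):
    rules[f"X{i}"] = (f"X{i-1}", f"X{i-1}")

print(slp_to_rle(rules, "X40"))  # [('a', 1099511627776)]
```

In this sketch the text of length 2^40 is never materialized: the work is bounded by roughly the number of rules times the number of runs in the output, which is the kind of polynomial dependence on compressed input and output sizes that the abstract describes. The conversions to LZ77 and LZ78 in the paper are considerably more involved and are not reproduced here.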

Similar resources

Compressed Pattern Matching for SEQUITUR

Sequitur, due to Nevill-Manning and Witten [18], is a powerful program for inferring a phrase hierarchy from the input text that also provides extremely effective compression of large quantities of semi-structured text [17]. In this paper, we address the problem of searching directly in Sequitur-compressed text. We show a compressed pattern matching algorithm that finds a pattern in compressed text ...

Direct Pattern Matching on Compressed Text

We present a fast compression and decompression technique for natural language texts. The novelty is that the exact search can be done on the compressed text directly, using any known sequential pattern matching algorithm. Approximate search can also be done efficiently without any decoding. The compression scheme uses a semi-static word-based modeling and a Huffman coding where the coding alpha...

Processing Compressed Texts: A Tractability Border

What kind of operations can we perform effectively (without full unpacking) with compressed texts? In this paper we consider three fundamental problems: (1) check the equality of two compressed texts, (2) check whether one compressed text is a substring of another compressed text, and (3) compute the number of different symbols (Hamming distance) between two compressed texts of the same length....

Using Inverted Files to Compress Text

This is the first report on a new approach to text compression. It consists of representing the text file with a compressed inverted file index in conjunction with a very compact lexicon, where the lexicon includes every word in the text. The index is compressed using standard index compression techniques, and the lexicon is compressed by an original dictionary compression method that gives better compression...

Reducing Code Size with Run-Time Decompression

Compressed representations of programs can be used to improve the code density in embedded systems. Several hardware decompression architectures have been proposed recently. In this paper, we present a method of decompressing programs using software. It relies on using a software-managed instruction cache under control of the decompressor. This is achieved by employing a simple cache management ...

Journal:
  • CoRR

Volume: abs/1107.2729

Publication date: 2011